AITopics | information-theoretic approach

An Information-theoretic Approach to Distribution Shifts

Neural Information Processing Systems | Dec-24-2025, 12:06:19 GMT

Safely deploying machine learning models to the real world is often a challenging process. For example, models trained with data obtained from a specific geographic location tend to fail when queried with data obtained elsewhere, agents trained in a simulation can struggle to adapt when deployed in the real world or novel environments, and neural networks that are fit to a subset of the population might carry some selection bias into their decision process. In this work, we describe the problem of data shift from an information-theoretic perspective by (i) identifying and describing the different sources of error and (ii) comparing some of the most promising objectives explored in the recent domain generalization and fair classification literature. From our theoretical analysis and empirical evaluation, we conclude that the model selection procedure needs to be guided by careful considerations regarding the observed data, the factors used for correction, and the structure of the data-generating process.

distribution shift, information-theoretic approach, name change, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Joint Entropy Search for Maximally-Informed Bayesian Optimization

Neural Information Processing Systems | Aug-14-2025, 16:14:38 GMT

Entropy Search and Predictive Entropy Search both consider the entropy over the optimum in the input space, while the recent Max-value Entropy Search considers the entropy over the optimal value in the output space.
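To make the distinction concrete, the following is a minimal Monte Carlo sketch that estimates both quantities on a discretized one-dimensional input space: the entropy of the optimum's location and the entropy of the optimal value. It is not the acquisition function of any of the methods named above. The random trigonometric curves are a hypothetical stand-in for samples from a surrogate model's posterior, and the names `sample_function` and `optimum_entropies`, the grid, and the histogram binning are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def shannon_entropy(counts):
    """Entropy (in nats) of the empirical distribution described by a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def sample_function(rng, grid):
    """Hypothetical stand-in for one posterior sample: a random trigonometric curve."""
    a, b, c = (rng.gauss(0.0, 1.0) for _ in range(3))
    return [a * math.sin(x) + b * math.cos(2 * x) + c * math.sin(3 * x) for x in grid]


def optimum_entropies(num_samples=2000, grid_size=50, num_value_bins=20, seed=0):
    """Return (entropy over argmax location, entropy over binned max value)."""
    rng = random.Random(seed)
    grid = [2 * math.pi * i / grid_size for i in range(grid_size)]
    argmax_counts = Counter()
    max_values = []
    for _ in range(num_samples):
        values = sample_function(rng, grid)
        best = max(range(grid_size), key=values.__getitem__)
        argmax_counts[best] += 1          # where the optimum sits (input space)
        max_values.append(values[best])   # how large the optimum is (output space)
    lo, hi = min(max_values), max(max_values)
    width = (hi - lo) / num_value_bins or 1.0  # guard against a zero-width range
    value_counts = Counter(
        min(int((v - lo) / width), num_value_bins - 1) for v in max_values
    )
    return shannon_entropy(argmax_counts), shannon_entropy(value_counts)


if __name__ == "__main__":
    h_location, h_value = optimum_entropies()
    print(f"entropy over optimum location: {h_location:.3f} nats")
    print(f"entropy over optimal value:    {h_value:.3f} nats")
```

Both estimates depend on the grid resolution and the number of value bins, so they are only comparable across runs that keep those settings fixed.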

acquisition function, international conference, optimization, (11 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Baden-Württemberg > Freiburg (0.04)
North America > United States (0.04)
Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)

An Information-theoretic Approach to Distribution Shifts

Neural Information Processing Systems | Jan-17-2025, 12:52:52 GMT

distribution shift, information-theoretic approach

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Generalization Bounds: Perspectives from Information Theory and PAC-Bayes

Hellström, Fredrik, Durisi, Giuseppe, Guedj, Benjamin, Raginsky, Maxim

arXiv.org Machine Learning | Sep-8-2023

A fundamental question in theoretical machine learning is generalization. Over the past decades, the PAC-Bayesian approach has been established as a flexible framework to address the generalization capabilities of machine learning algorithms, and design new ones. Recently, it has garnered increased interest due to its potential applicability for a variety of learning algorithms, including deep neural networks. In parallel, an information-theoretic view of generalization has developed, wherein the relation between generalization and various information measures has been established. This framework is intimately connected to the PAC-Bayesian approach, and a number of results have been independently discovered in both strands. In this monograph, we highlight this strong connection and present a unified treatment of generalization. We present techniques and results that the two perspectives have in common, and discuss the approaches and interpretations that differ. In particular, we demonstrate how many proofs in the area share a modular structure, through which the underlying ideas can be intuited. We pay special attention to the conditional mutual information (CMI) framework; analytical studies of the information complexity of learning algorithms; and the application of the proposed methods to deep learning. This monograph is intended to provide a comprehensive introduction to information-theoretic generalization bounds and their connection to PAC-Bayes, serving as a foundation from which the most recent developments are accessible. It is aimed broadly towards researchers with an interest in generalization and theoretical machine learning.
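As an illustration of the kind of relation between generalization and information measures mentioned above, one widely cited example is the mutual-information bound of Xu and Raginsky (2017). It is given here as a sketch under stated assumptions, not as a result quoted from the monograph: the loss is assumed to be $\sigma$-sub-Gaussian under the data distribution $\mu$, $S$ is a training set of $n$ i.i.d. samples, $W$ is the hypothesis returned by the learning algorithm, $L_\mu$ denotes the population risk, and $L_S$ the empirical risk.

```latex
\left| \mathbb{E}\left[ L_\mu(W) - L_S(W) \right] \right|
  \le \sqrt{\frac{2\sigma^2}{n} \, I(W; S)}
```

The bound says that an algorithm whose output reveals little about its training set, in the sense of a small mutual information $I(W; S)$, cannot have a large expected generalization gap.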

artificial intelligence, information-theoretic generalization bound, machine learning, (18 more...)

arXiv.org Machine Learning

2309.04381

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > France > Île-de-France > Paris > Paris (0.14)
(51 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Government (0.67)
Education > Educational Setting (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Explainable Empirical Risk Minimization

Jung, A.

arXiv.org Machine Learning | Sep-3-2020

The widespread use of modern machine learning methods in decision making crucially depends on their interpretability or explainability. The human users (decision makers) of machine learning methods are often not only interested in getting accurate predictions or projections. Rather, as a decision maker, the user also needs a convincing answer (or explanation) to the question of why a particular prediction was delivered. Explainable machine learning might be a legal requirement when used for decision making with an immediate effect on the health of human beings. As an example, consider the computer vision of a self-driving car whose predictions are used to decide whether to stop the car. We have recently proposed an information-theoretic approach to construct personalized explanations for predictions obtained from ML. This method was model-agnostic and only required some training samples of the model to be explained along with a user feedback signal. This paper uses an information-theoretic measure for the quality of an explanation to learn predictors that are intrinsically explainable to a specific user. Our approach is not restricted to a particular hypothesis space, such as linear maps or shallow decision trees, whose predictor maps are considered explainable by definition. Rather, we regularize an arbitrary hypothesis space using a personalized measure for the explainability of a particular predictor.

artificial intelligence, machine learning, prediction, (17 more...)

arXiv.org Machine Learning

2009.01492

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Finland (0.04)

Genre: Research Report (0.40)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.69)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

An Information-Theoretic Approach to Explainable Machine Learning

Jung, Alexander

arXiv.org Machine Learning | Mar-1-2020

A key obstacle to the successful deployment of machine learning (ML) methods in important application domains is the (lack of) explainability of predictions. Explainable ML is challenging since explanations must be tailored (personalized) to individual users with varying backgrounds. At one extreme, users might have received graduate-level education in machine learning, while at the other extreme, users might have no formal education in linear algebra. Linear regression with few features might be perfectly interpretable for the first group but must be considered a black box for the latter. Using a simple probabilistic model for the predictions and user knowledge, we formalize explainable ML using information theory. Providing an explanation is then considered as the task of reducing the "surprise" incurred by a prediction. Moreover, the effect of an explanation is measured by the conditional mutual information between the explanation and prediction, given the user background.
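The conditional mutual information used as the quality measure can be computed directly when all variables are discrete. The sketch below is a toy illustration, not the paper's model: the function name `conditional_mutual_information`, the joint distribution, and the "expert"/"novice" labels are assumptions chosen so the result is easy to check by hand. It evaluates I(E; Y | U) = sum over (e, y, u) of p(e, y, u) log[ p(e, y, u) p(u) / (p(e, u) p(y, u)) ], where E is the explanation, Y the prediction, and U the user background.

```python
import math
from collections import defaultdict


def conditional_mutual_information(joint):
    """I(E; Y | U) in nats for a joint pmf given as {(e, y, u): probability}."""
    p_u = defaultdict(float)
    p_eu = defaultdict(float)
    p_yu = defaultdict(float)
    for (e, y, u), p in joint.items():
        p_u[u] += p
        p_eu[(e, u)] += p
        p_yu[(y, u)] += p
    return sum(
        p * math.log(p * p_u[u] / (p_eu[(e, u)] * p_yu[(y, u)]))
        for (e, y, u), p in joint.items()
        if p > 0
    )


# Toy joint distribution (hypothetical): two equally likely user backgrounds.
# For the "expert", explanation and prediction are independent fair bits.
# For the "novice", the explanation fully determines the prediction.
toy_joint = {
    (0, 0, "expert"): 0.125, (0, 1, "expert"): 0.125,
    (1, 0, "expert"): 0.125, (1, 1, "expert"): 0.125,
    (0, 0, "novice"): 0.25, (1, 1, "novice"): 0.25,
}

if __name__ == "__main__":
    cmi = conditional_mutual_information(toy_joint)
    print(f"I(E; Y | U) = {cmi:.4f} nats")
```

In this toy case only the novice half of the population contributes, so the value equals 0.5 · log 2 nats.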

explainable ml, explanation, prediction, (16 more...)

arXiv.org Machine Learning

2003.00484

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(4 more...)

Genre: Research Report (0.50)

Industry:

Law (0.94)
Information Technology > Security & Privacy (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Q-Search Trees: An Information-Theoretic Approach Towards Hierarchical Abstractions for Agents with Computational Limitations

#artificialintelligence | Oct-6-2019, 14:26:48 GMT

In this paper, we develop a framework to obtain graph abstractions for decision-making by an agent where the abstractions emerge as a function of the agent's limited computational resources. We discuss the connection of the proposed approach with information-theoretic signal compression, and formulate a novel optimization problem to obtain tree-based abstractions as a function of the agent's computational resources. The structural properties of the new problem are discussed in detail, and two algorithmic approaches are proposed to obtain solutions to this optimization problem. We discuss the quality of, and prove relationships between, solutions obtained by the two proposed algorithms. The framework is demonstrated to generate a hierarchy of abstractions for a non-trivial environment.

computational limitation, hierarchical abstraction, information-theoretic approach, (4 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.40)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.40)

An Information-Theoretic Approach to Minimax Regret in Partial Monitoring

Lattimore, Tor, Szepesvari, Csaba

arXiv.org Machine Learning | Feb-1-2019

We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under partial monitoring with no assumptions on the space of signals or decisions of the adversary. We then generalise the information-theoretic tools of Russo and Van Roy (2016) for proving Bayesian regret bounds and combine them with the minimax theorem to derive minimax regret bounds for various partial monitoring settings. The highlight is a clean analysis of 'non-degenerate easy' and 'hard' finite partial monitoring, with new regret bounds that are independent of arbitrarily large game-dependent constants. The power of the generalised machinery is further demonstrated by proving that the minimax regret for k-armed adversarial bandits is at most $\sqrt{2kn}$, improving on existing results by a factor of 2. Finally, we provide a simple analysis of the cops and robbers game, also improving best known constants.
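To give a feel for the scale of the bandit bound, the short sketch below evaluates the square root of 2kn for a few horizons, reading k as the number of arms and n as the number of rounds. The function name `bandit_regret_bound` and the chosen values of k and n are hypothetical, and the code only evaluates the stated formula; it does not simulate a bandit algorithm.

```python
import math


def bandit_regret_bound(k, n):
    """Evaluate sqrt(2 * k * n), the stated upper bound on minimax regret."""
    if k < 1 or n < 1:
        raise ValueError("k and n must be positive integers")
    return math.sqrt(2 * k * n)


if __name__ == "__main__":
    k = 10  # hypothetical number of arms
    for n in (100, 10_000, 1_000_000):
        bound = bandit_regret_bound(k, n)
        # The bound grows like sqrt(n), so the per-round regret shrinks with n.
        print(f"n={n:>9}: bound={bound:10.1f}, per round={bound / n:.4f}")
```

Because the bound grows with the square root of n, the average regret per round tends to zero as the horizon grows.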

information-theoretic approach, minimax regret, partial monitoring, (13 more...)

arXiv.org Machine Learning

1902.0047

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
North America > United States > Arizona > Maricopa County > Scottsdale (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

On the Minimax Risk of Dictionary Learning

Jung, Alexander, Eldar, Yonina C., Görtz, Norbert

arXiv.org Machine Learning | Jul-20-2015

We consider the problem of learning a dictionary matrix from a number of observed signals, which are assumed to be generated via a linear model with a common underlying dictionary. In particular, we derive lower bounds on the minimum achievable worst-case mean squared error (MSE), regardless of the computational complexity of the dictionary learning (DL) schemes. By casting DL as a classical (or frequentist) estimation problem, the lower bounds on the worst-case MSE are derived by following an established information-theoretic approach to minimax estimation. The main conceptual contribution of this paper is the adaptation of the information-theoretic approach to minimax estimation for the DL problem in order to derive lower bounds on the worst-case MSE of any DL scheme. We derive three different lower bounds applying to different generative models for the observed signals. The first bound applies to a wide range of models; it only requires the existence of a covariance matrix of the (unknown) underlying coefficient vector. By specializing this bound to the case of sparse coefficient distributions, and assuming the true dictionary satisfies the restricted isometry property, we obtain a lower bound on the worst-case MSE of DL schemes in terms of a signal-to-noise ratio (SNR). The third bound applies to a more restrictive subclass of coefficient distributions by requiring the non-zero coefficients to be Gaussian. While the applicability of this final bound is the most limited compared with the previous two bounds, it is the tightest of the three bounds in the low-SNR regime.
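The generative model described above, in which each observed signal is a noisy linear combination of a few columns of a shared dictionary, can be sketched as follows. This is a minimal illustration under assumptions not taken from the paper: the function name `generate_signals`, the use of i.i.d. Gaussian noise, standard Gaussian non-zero coefficients, and the uniformly random sparsity pattern are all choices made for the example.

```python
import random


def generate_signals(dictionary, num_signals, sparsity, noise_std, seed=0):
    """Draw signals y = D x + noise with a shared dictionary D (list of rows).

    Each coefficient vector x has exactly `sparsity` non-zero entries, drawn
    from a standard Gaussian; the noise is i.i.d. Gaussian with std `noise_std`.
    Returns (signals, coefficients).
    """
    rng = random.Random(seed)
    num_rows, num_atoms = len(dictionary), len(dictionary[0])
    signals, coefficients = [], []
    for _ in range(num_signals):
        x = [0.0] * num_atoms
        for j in rng.sample(range(num_atoms), sparsity):
            x[j] = rng.gauss(0.0, 1.0)
        y = [
            sum(dictionary[i][j] * x[j] for j in range(num_atoms))
            + rng.gauss(0.0, noise_std)
            for i in range(num_rows)
        ]
        signals.append(y)
        coefficients.append(x)
    return signals, coefficients


if __name__ == "__main__":
    # Hypothetical 3 x 4 dictionary: three observed dimensions, four atoms.
    D = [
        [1.0, 0.0, 0.0, 0.5],
        [0.0, 1.0, 0.0, 0.5],
        [0.0, 0.0, 1.0, 0.5],
    ]
    ys, xs = generate_signals(D, num_signals=5, sparsity=2, noise_std=0.1)
    for y, x in zip(ys, xs):
        print([round(v, 3) for v in y], "from", [round(v, 3) for v in x])
```

A dictionary learning scheme sees only the signals and must estimate the dictionary; the lower bounds in the abstract limit how well any such scheme can do in the worst case.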

artificial intelligence, machine learning, minimax risk, (16 more...)

arXiv.org Machine Learning

1507.05498

Country:

North America > United States (0.93)
Europe (0.67)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Random Forests Can Hash

Qiu, Qiang, Sapiro, Guillermo, Bronstein, Alex

arXiv.org Machine Learning | Apr-16-2015

Hash codes are a very efficient data representation needed to cope with the ever-growing amounts of data. We introduce a random forest semantic hashing scheme with information-theoretic code aggregation, showing for the first time how random forest, a technique that together with deep learning has shown spectacular results in classification, can also be extended to large-scale retrieval. Traditional random forest fails to enforce the consistency of hashes generated from each tree for the same class data, i.e., to preserve the underlying similarity, and it also lacks a principled way for code aggregation across trees. We start with a simple hashing scheme, where independently trained random trees in a forest act as hashing functions. We then propose a subspace model as the splitting function, and show that it enforces the hash consistency in a tree for data from the same class. We also introduce an information-theoretic approach for aggregating codes of individual trees into a single hash code, producing a near-optimal unique hash for each class. Experiments on large-scale public datasets are presented, showing that the proposed approach significantly outperforms state-of-the-art hashing methods for retrieval tasks.

artificial intelligence, machine learning, random forest, (16 more...)

arXiv.org Machine Learning

1412.5083

Country: North America > Canada (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)
